Character-based Collocation for Mandarin Chinese
Authors
Abstract
This paper describes a character-based Chinese collocation system and discusses its advantages over a traditional word-based system. Since word breaks are not conventionally marked in Chinese text corpora, a character-based collocation system has the dual advantages of avoiding pre-processing distortion and directly accessing sub-lexical information. Furthermore, word-based collocational properties can be obtained through an auxiliary module of automatic segmentation.

… corpora as they are, we will be able to access sub-lexical information without additional cost. To take full advantage of the nature of texts, reliable tools can also be devised to obtain lexical collocations. In this paper, we will describe the design and implementation of a Chinese collocational system that does not require the preprocessing of automatic segmentation but allows both lexical and sub-lexical information to be extracted automatically.
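The two-level idea in the abstract (collocations computed directly over characters, with word-level collocations recovered through an optional segmentation module) can be illustrated with a small sketch. The abstract does not name the association measure or the segmentation algorithm, so this example assumes pointwise mutual information (PMI) over a fixed co-occurrence window and uses forward maximum matching as a stand-in segmenter. The toy corpus, the lexicon, and all function names (`window_pairs`, `pmi_collocations`, `max_match_segment`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus: unsegmented sentences, punctuation already removed.
CORPUS = ["我们研究中文", "中文没有词界", "研究中文搭配", "中文搭配很有趣"]
# Hypothetical lexicon for the auxiliary segmentation step (assumption).
LEXICON = {"我们", "研究", "中文", "没有", "词界", "搭配", "有趣"}


def window_pairs(units, window):
    """Yield ordered (left, right) pairs where right lies within `window` positions after left."""
    for i, left in enumerate(units):
        for j in range(i + 1, min(i + 1 + window, len(units))):
            yield left, units[j]


def pmi_collocations(sequences, window=2, min_count=2):
    """Rank ordered unit pairs by PMI; units may be characters or words.

    Pairs never cross sequence (sentence) boundaries. Pairs seen fewer than
    `min_count` times are dropped, since PMI is unreliable for rare events.
    """
    unit_counts, pair_counts = Counter(), Counter()
    for seq in sequences:
        unit_counts.update(seq)
        pair_counts.update(window_pairs(seq, window))
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        if count < min_count:
            continue
        p_ab = count / n_pairs
        p_a, p_b = unit_counts[a] / n_units, unit_counts[b] / n_units
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def max_match_segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon entry at each position,
    falling back to a single character when nothing longer matches."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    # Character level: no segmentation needed, so sub-lexical pairs such as 中/文 are visible.
    for pair, score in pmi_collocations([list(s) for s in CORPUS]):
        print("char", pair, round(score, 2))
    # Word level: obtained only after the auxiliary segmentation step.
    segmented = [max_match_segment(s, LEXICON) for s in CORPUS]
    for pair, score in pmi_collocations(segmented):
        print("word", pair, round(score, 2))
```

The same scoring function serves both levels because it only sees sequences of units; switching from characters to words is a matter of what is fed in. This mirrors the abstract's claim that the character-based layer needs no pre-processing, while the word-based layer depends on segmentation quality (here, on the coverage of the hypothetical lexicon).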